Line-Level Layout Recognition of Historical Documents with Background Knowledge

نویسندگان

چکیده

Digitization and transcription of historic documents offer new research opportunities for humanists are the topics many edition projects. However, manual work is still required main phases layout recognition subsequent optical character (OCR) early printed documents. This paper describes evaluates how deep learning approaches recognize text lines can be extended to using background knowledge. The evaluation was performed on five corpora prints from 15th 16th Centuries, representing a variety features. While with standard layouts could recognized in correct reading order precision recall up 99.9%, also complex were at rate as high 90% by knowledge, full potential which revealed if pages same source transcribed.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Text Line Extraction from Complex Layout Documents

There are numerous stylish documents which do not have the traditional text layouts where printed text regions are not parallel to each other. Such complex layouts make text line extraction challenging due to multi-orientation of paragraphs. This paper introduces a system for the text line extraction from the complex layout documents. Proposed method is based on the concept of dilation and hist...

متن کامل

Handwritten Text Recognition for Historical Documents

The amount of digitized legacy documents has been rising dramatically over the last years due mainly to the increasing number of on-line digital libraries publishing this kind of documents. The vast majority of them remain waiting to be transcribed into a textual electronic format (such as ASCII or PDF) that would provide historians and other researchers new ways of indexing, consulting and que...

متن کامل

Integrating Optical Character Recognition and Machine Translation of Historical Documents

Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in non-accessible formats for machine processing (e.g., Historical or Legal documents). Such documents must be passed through a process of Optical Character Recognition (OCR) to render the text suitable for MT. No matter how...

متن کامل

Classification and Recognition of Neume Note Notation in Historical Documents

Neume musical notation is a type of writing of Christian Orthodox Church chant originated in Ancient Byzantium, which is still used by the Orthodox Church. The big varieties of preserved historical documents, which contain neumes give reach material for investigations in fields like history, cultural sciences, etc. This causes the natural need of a software package which can help these investig...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Algorithms

سال: 2023

ISSN: ['1999-4893']

DOI: https://doi.org/10.3390/a16030136